Abstract

We present a new bound for suprema of a special type of chaos processes indexed by a set of matrices, which is based on a chaining method. As applications we show significantly improved estimates for the restricted isometry constants of partial random circulant matrices and time-frequency structured random matrices. In both cases the required condition on the number $m$ of rows in terms of the sparsity $s$ and the vector length $n$ is

$$m \gtrsim s \log^2 s \log^2 n.$$
1 Introduction and Main Results

1.1 Compressive Sensing 

Compressive sensing [7, 13, 18, 37] is a method aimed at recovering sparse vectors from highly incomplete information using efficient algorithms. This discovery has recently triggered various applications in signal and image processing.

To formulate the procedure, a vector $x \in \mathbb{C}^n$ is called $s$-sparse if $\|x\|_0 := |\{\ell : x_\ell \neq 0\}| \le s$. Given a matrix $\Phi \in \mathbb{C}^{m \times n}$, called the measurement matrix, the task is to reconstruct $x$ from the linear measurements

$$y = \Phi x.$$

We are interested in the case $m \ll n$, so that this system is under-determined, and thus, without additional information, it is impossible to reconstruct $x$. On the other hand, if it is known a priori that $x$ is $s$-sparse, then the situation changes. Although the naive approach to reconstruction, namely $\ell_0$-minimization,

$$\min \|z\|_0 \quad \text{subject to} \quad \Phi z = y,$$

is NP-hard in general, there are several tractable alternatives, for instance $\ell_1$-minimization

$$\min \|z\|_1 \quad \text{subject to} \quad \Phi z = y$$

(where $\|z\|_p$ denotes the usual $\ell_p$-norm), which is a convex optimization problem and may be solved efficiently.

The restricted isometry property streamlines the analysis of recovery algorithms. For a matrix $\Phi \in \mathbb{C}^{m \times n}$ and $s < n$, the restricted isometry constant $\delta_s$ is defined as the smallest number such that

$$(1 - \delta_s)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_s)\|x\|_2^2 \quad \text{for all } s\text{-sparse } x.$$
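For very small matrices, $\delta_s$ can be evaluated directly from this definition: for $x$ supported on a set $S$ with $|S| = s$ one has $\|\Phi x\|_2^2 = x^*\Phi_S^*\Phi_S x$, so $\delta_s$ is the largest distance of an eigenvalue of a Gram matrix $\Phi_S^*\Phi_S$ from 1. The Python sketch below (assuming NumPy; the function name `restricted_isometry_constant` is our own) enumerates all supports of size $s$, so its cost grows like $\binom{n}{s}$ and it is meant only for illustration.

```python
import itertools

import numpy as np


def restricted_isometry_constant(Phi, s):
    """Exact delta_s of Phi by brute force over all supports of size s.

    Illustrative sketch: the cost grows like binomial(n, s), so this is only
    usable for very small n.
    """
    n = Phi.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(n), s):
        cols = Phi[:, list(support)]
        gram = cols.conj().T @ cols          # Phi_S^* Phi_S (Hermitian)
        eigs = np.linalg.eigvalsh(gram)      # eigenvalues in ascending order
        delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, s = 8, 12, 2
    # Bernoulli matrix with entries +-1/sqrt(m)
    Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    print("delta_2 =", restricted_isometry_constant(Phi, s))
```

Vectors with fewer than $s$ nonzero entries are covered as well, since each of them is supported on some set of size exactly $s$.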

One may show that under conditions of the form $\delta_{\kappa s} < \delta^*$ for some $\delta^* < 1$ and some appropriate small integer $\kappa$, a variety of recovery algorithms reconstruct every $s$-sparse $x$ from $y = \Phi x$. Among these are $\ell_1$-minimization as mentioned above [?, 19, ?], orthogonal matching pursuit [57], CoSaMP [55, ?], iterative hard thresholding [5] and hard thresholding pursuit [20].
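To make one of these algorithms concrete, here is a minimal Python sketch of orthogonal matching pursuit, assuming NumPy. The function name `omp` and the choice to run exactly $s$ iterations are our own illustrative choices, not taken from the cited works: each step selects the column most correlated with the current residual and then refits $y$ by least squares on all selected columns.

```python
import numpy as np


def omp(Phi, y, s):
    """Orthogonal matching pursuit run for exactly s iterations (illustrative sketch)."""
    n = Phi.shape[1]
    support = []
    residual = y.copy()
    coeffs = np.zeros(0)
    for _ in range(s):
        # correlation of every column with the current residual
        correlations = np.abs(Phi.conj().T @ residual)
        correlations[support] = -1.0  # never pick an already selected column
        support.append(int(np.argmax(correlations)))
        # least-squares fit of y on the selected columns
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x_hat = np.zeros(n, dtype=np.result_type(Phi, y))
    x_hat[support] = coeffs
    return x_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n = 60, 80
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    x[[3, 41]] = [1.0, -1.5]
    print(np.allclose(omp(Phi, Phi @ x, 2), x))
```

With a Gaussian $60 \times 80$ matrix and a 2-sparse vector as in the usage example, the sketch typically recovers $x$ up to rounding; no uniform guarantee is claimed for this particular implementation.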

Remarkably, all optimal measurement matrices known so far are random matrices. For example, a Bernoulli random matrix $\Phi \in \mathbb{R}^{m \times n}$ has entries $\Phi_{jk} = \varepsilon_{jk}/\sqrt{m}$, where the $\varepsilon_{jk}$ are independent, symmetric $\{-1, 1\}$-valued random variables. Its restricted isometry constant satisfies $\delta_s \le \delta$ with probability at least $1 - \eta$ provided that

$$m \ge C\delta^{-2}\big(s\ln(en/s) + \ln(\eta^{-1})\big),$$

where $C$ is an absolute constant [9, 30, 3].

In practice, structure is an additional requirement on the measurement matrix $\Phi$. Indeed, certain applications impose constraints on the matrix, and recovery algorithms can be accelerated when fast matrix-vector multiplication routines are available for $\Phi$. Unfortunately, a Bernoulli random matrix does not possess any structure. This motivates the study of random matrices with more structure. Also, structured random matrix constructions usually involve a reduced degree of randomness. For example, partial random Fourier matrices $\Phi \in \mathbb{C}^{m \times n}$ arise from random row submatrices of the discrete Fourier matrix, and their restricted isometry constants satisfy $\delta_s \le \delta$ with high probability provided that

$$m \ge C\delta^{-2}s\log^{?} s\log n,$$

see [9, 41].

This article provides a similar estimate for two further types of structured random matrices, namely partial random circulant matrices and time-frequency structured random matrices. The key proof ingredients will be new estimates for suprema of chaos processes of a certain type.

1.2 Partial random circulant matrices 

Circulant matrices are connected to circular convolution, defined for two vectors $x, z \in \mathbb{C}^n$ by

$$(z * x)_j := \sum_{k=1}^n z_{j \ominus k}x_k, \quad j = 1, \dots, n,$$

where $j \ominus k = j - k \bmod n$ is the cyclic subtraction. The circulant matrix $H = H_z \in \mathbb{C}^{n \times n}$ associated with $z$ is given by $Hx = z * x$ and has entries $H_{jk} = z_{j \ominus k}$.
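The following Python sketch (assuming NumPy, and using 0-based indices $0, \dots, n-1$ instead of $1, \dots, n$) builds $H_z$ from the entry formula $H_{jk} = z_{j \ominus k}$ and compares $H_zx$ with the defining sum and with the FFT-based evaluation that the convolution theorem allows. The function names are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def circular_convolution(z, x):
    """(z * x)_j = sum_k z_{(j - k) mod n} x_k, evaluated directly (0-based indices)."""
    n = len(z)
    return np.array([sum(z[(j - k) % n] * x[k] for k in range(n)) for j in range(n)])


def circulant_matrix(z):
    """H_z with entries H[j, k] = z[(j - k) mod n], so that H_z @ x equals z * x."""
    n = len(z)
    j, k = np.indices((n, n))
    return z[(j - k) % n]


def fft_convolution(z, x):
    """The same convolution via the convolution theorem, in O(n log n) operations."""
    return np.fft.ifft(np.fft.fft(z) * np.fft.fft(x))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    z, x = rng.standard_normal(7), rng.standard_normal(7)
    print(np.allclose(circulant_matrix(z) @ x, circular_convolution(z, x)))
    print(np.allclose(fft_convolution(z, x), circular_convolution(z, x)))
```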

We are interested in sparse recovery from subsampled convolutions with a random vector. Formally, let $\Omega \subset \{1, \dots, n\}$ be an arbitrary (fixed) set of cardinality $m$, and denote by $R_\Omega : \mathbb{C}^n \to \mathbb{C}^m$ the operator that restricts a vector $x \in \mathbb{C}^n$ to its entries in $\Omega$. Let $\varepsilon = (\varepsilon_i)_{i=1}^n$ be a Rademacher vector of length $n$, i.e., a random vector with independent entries distributed according to $\mathbb{P}(\varepsilon_i = \pm 1) = 1/2$. Then the associated partial random circulant matrix is given by $\Phi = m^{-1/2}R_\Omega H_\varepsilon \in \mathbb{R}^{m \times n}$ and acts on vectors $x \in \mathbb{C}^n$ via

$$\Phi x = \frac{1}{\sqrt{m}}R_\Omega(\varepsilon * x).$$
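A dense and a matrix-free version of $\Phi$ can be sketched as follows, again assuming NumPy and 0-based indices; the helper names and the example sampling set (a 0-based version of $\{L, 2L, \dots, mL\}$) are our own choices. The matrix-free variant evaluates $\Phi x$ with two FFTs and one inverse FFT, which illustrates the kind of fast matrix-vector multiplication that motivates structured measurement matrices.

```python
import numpy as np


def partial_random_circulant(n, omega, rng):
    """Dense Phi = m^{-1/2} R_Omega H_eps for a Rademacher generator eps (illustration)."""
    omega = np.asarray(omega)
    eps = rng.choice([-1.0, 1.0], size=n)
    j, k = np.indices((n, n))
    H = eps[(j - k) % n]                          # circulant matrix H_eps
    return H[omega, :] / np.sqrt(len(omega)), eps  # keep only the rows in Omega


def apply_partial_circulant(eps, omega, x):
    """Matrix-free Phi x = m^{-1/2} R_Omega (eps * x) computed with the FFT."""
    conv = np.fft.ifft(np.fft.fft(eps) * np.fft.fft(x))
    return conv[np.asarray(omega)] / np.sqrt(len(omega))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, L, m = 32, 4, 8
    omega = L * np.arange(1, m + 1) - 1           # 0-based version of {L, 2L, ..., mL}
    Phi, eps = partial_random_circulant(n, omega, rng)
    x = rng.standard_normal(n)
    print(np.allclose(Phi @ x, apply_partial_circulant(eps, omega, x)))
```

Every column of $\Phi$ has unit norm here, because each column consists of $m$ entries $\pm m^{-1/2}$.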

In other words, $\Phi$ is a circulant matrix generated by a Rademacher vector, where the rows outside $\Omega$ are removed. Our first main result establishes the restricted isometry property of $\Phi$ in a near-optimal parameter regime:

Theorem 1.1 Let $\Phi \in \mathbb{R}^{m \times n}$ be a draw of a partial random circulant matrix generated by a Rademacher vector $\varepsilon$. If

$$m \ge c\,\delta^{-2}s(\log^2 s)(\log^2 n), \tag{1.1}$$

then with probability at least $1 - n^{-(\log n)(\log^2 s)}$, the restricted isometry constant of $\Phi$ satisfies $\delta_s \le \delta$. The constant $c > 0$ is universal.

In Section 4 we will prove a more general version of this theorem, just requiring that the generating random variables are mean-zero, of variance one, and subgaussian. These results improve the best previously known estimate for a partial random circulant matrix [38], namely that $m \ge C_\delta(s\log n)^{3/2}$ is a sufficient condition for achieving $\delta_s \le \delta$ with high probability (see also [22] for an earlier work on this problem). In particular, Theorem 1.1 removes the exponent 3/2 of the sparsity $s$, which was already conjectured in [38] to be an artefact of the proof.

A related non-uniform recovery result is contained in [36, 37], where one considers the probability that a fixed $s$-sparse vector is reconstructed via $\ell_1$-minimization using a draw of a partial random circulant matrix. The condition derived there is $m \ge Cs\log^2 n$, which is slightly better than (1.1). However, the statement of Theorem 1.1 is considerably stronger because it implies uniform and stable recovery of all $s$-sparse vectors via $\ell_1$-minimization and other recovery methods for a single matrix $\Phi$.

Note that in [31], the restricted isometry property has been established for partial random circulant matrices with random sampling sets and random generators under the condition $m \ge Cs\log^{?} n$. In contrast, our result holds for an arbitrary fixed selection of a set $\Omega \subset \{1, \dots, n\}$, which is important in applications since in many practical problems it is natural or desired to consider structured sampling sets such as $\Omega = \{L, 2L, 3L, \dots, mL\}$ for some $L \in \mathbb{N}$; these sets are clearly far from being random.

Potential applications of compressive sensing with subsampled random convolutions include system identification, radar and cameras with coded aperture. We refer to [22, 39, 38] for a discussion of these applications.

Combining our result with the work [26] on the relation between the restricted isometry property and the Johnson-Lindenstrauss lemma, we also obtain an improved estimate for Johnson-Lindenstrauss embeddings arising from partial random circulant matrices; see also [24, 46] for earlier work in this direction.

Theorem 1.2 Fix $\eta, \delta \in (0, 1)$, and consider a finite set $E \subset \mathbb{R}^n$ of cardinality $|E| = p$. Choose

$$m \ge C_1\delta^{-2}\log(C_2p)\big(\log\log(C_2p)\big)^2(\log n)^2,$$

where the constants $C_1, C_2$ depend only on $\eta$. Let $\Phi \in \mathbb{C}^{m \times n}$ be a partial circulant matrix generated by a Rademacher vector $\varepsilon$. Furthermore, let $\varepsilon' \in \mathbb{R}^n$ be a Rademacher vector independent of $\varepsilon$ and set $D_{\varepsilon'}$ to be the diagonal matrix with diagonal $\varepsilon'$. Then with probability exceeding $1 - \eta$, for every $x \in E$,

$$(1 - \delta)\|x\|_2^2 \le \|\Phi D_{\varepsilon'}x\|_2^2 \le (1 + \delta)\|x\|_2^2.$$
1.3 Time-frequency structured random matrices



The translation and modulation operators on $\mathbb{C}^m$ are defined by $(Th)_j = h_{j \ominus 1}$ and $(Mh)_j = e^{2\pi ij/m}h_j = \omega^jh_j$, where $\omega = e^{2\pi i/m}$ and $\ominus$ again denotes cyclic subtraction, this time modulo $m$. Observe that

$$(T^kh)_j = h_{j \ominus k} \quad \text{and} \quad (M^\ell h)_j = e^{2\pi i\ell j/m}h_j = \omega^{\ell j}h_j. \tag{1.2}$$
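As a quick sanity check of (1.2), the sketch below (assuming NumPy and 0-based indices $j = 0, \dots, m-1$; the helper names are hypothetical) builds $T$ as a cyclic permutation matrix and $M$ as a diagonal matrix.

```python
import numpy as np


def translation(m):
    """T with (T h)_j = h_{(j - 1) mod m}, i.e. a cyclic shift by one position."""
    return np.roll(np.eye(m), 1, axis=0)


def modulation(m):
    """M with (M h)_j = omega**j * h_j, where omega = exp(2 pi i / m)."""
    return np.diag(np.exp(2j * np.pi * np.arange(m) / m))


if __name__ == "__main__":
    m = 6
    h = np.arange(m, dtype=complex)
    T, M = translation(m), modulation(m)
    k, ell = 2, 3
    # (1.2): T^k h is a cyclic shift by k, M^ell multiplies entry j by omega**(ell * j)
    print(np.allclose(np.linalg.matrix_power(T, k) @ h, np.roll(h, k)))
    print(np.allclose(np.linalg.matrix_power(M, ell) @ h,
                      np.exp(2j * np.pi * ell * np.arange(m) / m) * h))
```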

The time-frequency shifts are given by

$$\pi(\lambda) = M^\ell T^k, \quad \lambda = (k, \ell) \in \mathbb{Z}_m^2 = \{0, \dots, m-1\}^2.$$

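For $h \in \mathbb{C}^m \setminus \{0\}$ the system $\{\pi(\lambda)h : \lambda \in \mathbb{Z}_m^2\}$ is called a Gabor system [?, 21, 25], and the $m \times m^2$ matrix $\Psi_h$ whose columns are the vectors $\pi(\lambda)h$, $\lambda \in \mathbb{Z}_m^2$, is called a Gabor synthesis matrix,

$$\Psi_h = \big[\pi(\lambda)h\big]_{\lambda \in \mathbb{Z}_m^2} \in \mathbb{C}^{m \times m^2}.$$

Note that here the signal length $n$ is coupled to the embedding dimension $m$ via $n = m^2$ (so that $\log n = 2\log m$ below).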
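A direct way to assemble $\Psi_h$ is sketched below in Python (assuming NumPy). The column ordering $\lambda = (k, \ell)$ in lexicographic order is our own convention, since the text does not fix one, and the helper name is hypothetical. As a check we use the standard fact that the full Gabor system is a tight frame, $\Psi_h\Psi_h^* = m\|h\|_2^2 I$, which for $h = m^{-1/2}\varepsilon$ gives $\Psi_h\Psi_h^* = mI$.

```python
import numpy as np


def gabor_synthesis_matrix(h):
    """m x m^2 matrix with columns M^ell T^k h, lambda = (k, ell) in lexicographic order."""
    m = len(h)
    j = np.arange(m)
    columns = []
    for k in range(m):
        shifted = np.roll(h, k)                                       # T^k h
        for ell in range(m):
            columns.append(np.exp(2j * np.pi * ell * j / m) * shifted)  # M^ell T^k h
    return np.column_stack(columns)


if __name__ == "__main__":
    rng = np.random.default_rng(12)
    m = 5
    h = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # normalized Rademacher generator
    Psi = gabor_synthesis_matrix(h)
    print(Psi.shape, np.allclose(Psi @ Psi.conj().T, m * np.eye(m)))
```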

Our second main result establishes the restricted isometry property for Gabor synthesis matrices generated by a random vector. The following formulation again focuses on normalized Rademacher vectors, postponing a more general version of our results until Section 5.

Theorem 1.3 Let $\varepsilon$ be a Rademacher vector and consider the Gabor synthesis matrix $\Psi_h \in \mathbb{C}^{m \times m^2}$ generated by $h = \frac{1}{\sqrt{m}}\varepsilon$. If

$$m \ge c\,\delta^{-2}s(\log s)^2(\log m)^2, \tag{1.3}$$

then with probability at least $1 - m^{-(\log m)(\log^2 s)}$, the restricted isometry constant of $\Psi_h$ satisfies $\delta_s \le \delta$.

Again, Theorem 1.3 improves the best previously known estimate from [34], in which the sufficient condition $m \ge Cs^{3/2}\log^3 m$ was derived. In particular, it implies the first uniform sparse recovery result with a linear scaling of the number of samples $m$ in the sparsity $s$ (up to log-factors).

A non-uniform recovery result for Gabor synthesis matrices with Steinhaus generator (see Section 2.2 for the definition) appears in [32], where it was shown that a fixed $s$-sparse vector is recovered from its image under a random draw of the $m \times m^2$ Gabor synthesis matrix via $\ell_1$-minimization with high probability provided that $m \ge Cs\log m$. Again, the conclusion of Theorem 1.3 is stronger than this previous result in the sense that it implies uniform and stable $s$-sparse recovery. Further related material may be found in [33, 2].

Applications of random Gabor synthesis matrices include operator identification (channel estimation in wireless communications), radar and sonar [5, 53, 33].

1.4 Suprema of chaos processes 

Both for partial random circulant matrices and for time-frequency structured random matrices generated by Rademacher vectors, the restricted isometry constants $\delta_s$ can be written as a random variable $X$ of the form

$$X = \sup_{A \in \mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big|, \tag{1.4}$$

where $\mathcal{A}$ is a set of matrices and $\varepsilon$ is a Rademacher vector. Due to the identity (1.7) below, $X$ is the supremum of a chaos process.
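For a single matrix $A$ and a Rademacher vector, one has $\mathbb{E}\|A\varepsilon\|_2^2 = \|A\|_F^2$, and the deviation equals the off-diagonal sum $\sum_{j \neq k}\varepsilon_j\varepsilon_k(A^*A)_{j,k}$. The Python sketch below (assuming NumPy; the function names are illustrative) computes both expressions and evaluates the supremum (1.4) over a finite family of matrices.

```python
import itertools

import numpy as np


def chaos_deviation(A, eps):
    """||A eps||_2^2 - E||A eps||_2^2, using E||A eps||_2^2 = ||A||_F^2 for Rademacher eps."""
    return np.linalg.norm(A @ eps) ** 2 - np.linalg.norm(A, "fro") ** 2


def off_diagonal_chaos(A, eps):
    """sum_{j != k} eps_j eps_k (A^* A)_{jk} for a real sign vector eps."""
    G = A.conj().T @ A
    G_off = G - np.diag(np.diag(G))
    return np.real(eps @ G_off @ eps)


def X_for_finite_set(matrices, eps):
    """The supremum (1.4) over a finite family of matrices, for one draw of eps."""
    return max(abs(chaos_deviation(A, eps)) for A in matrices)


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    family = [rng.standard_normal((3, 4)) for _ in range(10)]
    eps = rng.choice([-1.0, 1.0], size=4)
    print(X_for_finite_set(family, eps))
```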






Our third main result, which is the main ingredient of the proofs of Theorems 1.1 and 1.3 but also of independent interest, provides expectation and deviation bounds for random variables $X$ of this form in terms of two types of complexity parameters of the set of matrices $\mathcal{A}$. The first one, denoted by $d_F(\mathcal{A})$ and $d_{2\to2}(\mathcal{A})$, is the radius of $\mathcal{A}$ in the Frobenius norm $\|A\|_F = \sqrt{\operatorname{tr}(A^*A)}$ and the operator norm $\|A\|_{2\to2} = \sup_{\|x\|_2 \le 1}\|Ax\|_2$, respectively. That is, $d_F(\mathcal{A}) = \sup_{A \in \mathcal{A}}\|A\|_F$ and $d_{2\to2}(\mathcal{A}) = \sup_{A \in \mathcal{A}}\|A\|_{2\to2}$. For the second one, Talagrand's functional $\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})$, we refer to Definition 2.1 below for a precise description.
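For a finite family of matrices, both radii are straightforward to compute. The sketch below assumes NumPy, whose `np.linalg.norm(A, 2)` returns the largest singular value, that is, $\|A\|_{2\to2}$; the helper name `radii` is our own. Since $\|A\|_{2\to2} \le \|A\|_F$ for every matrix, one always has $d_{2\to2}(\mathcal{A}) \le d_F(\mathcal{A})$.

```python
import numpy as np


def radii(matrices):
    """Return (d_F, d_{2->2}) for a finite family of matrices."""
    d_F = max(np.linalg.norm(A, "fro") for A in matrices)
    d_op = max(np.linalg.norm(A, 2) for A in matrices)   # largest singular value
    return d_F, d_op


if __name__ == "__main__":
    rng = np.random.default_rng(13)
    family = [rng.standard_normal((4, 6)) for _ in range(5)]
    d_F, d_op = radii(family)
    print(d_F, d_op, d_op <= d_F)
```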
With these notions, our result reads as follows. 

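Theorem 1.4 Let $\mathcal{A} \subset \mathbb{C}^{m \times n}$ be a symmetric set of matrices, $\mathcal{A} = -\mathcal{A}$. Let $\varepsilon$ be a Rademacher vector of length $n$. Then

$$\mathbb{E}\sup_{A \in \mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| \le C_1\Big(d_F(\mathcal{A})\,\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})^2\Big) =: C_1E. \tag{1.5}$$

Furthermore, for $t > 0$,

$$\mathbb{P}\Big(\sup_{A \in \mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| \ge C_2E + t\Big) \le 2\exp\Big(-C_3\min\Big\{\frac{t^2}{V^2}, \frac{t}{U}\Big\}\Big), \tag{1.6}$$

where

$$V = d_{2\to2}(\mathcal{A})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) \quad \text{and} \quad U = d_{2\to2}^2(\mathcal{A}).$$

The constants $C_1, C_2, C_3 > 0$ are universal.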

The symmetry assumption $\mathcal{A} = -\mathcal{A}$ was made for the sake of simplicity. The more general Theorem 3.1 below does not use this assumption but requires an additional term on the right-hand side of the estimate. Furthermore, Theorem 3.1 will actually be stated under more general conditions on the generating random vector.

Let us relate our new bound to previous estimates. By expanding the $\ell_2$-norms we can rewrite $X$ in (1.4) as

$$X = \sup_{A \in \mathcal{A}}\Big|\sum_{j \neq k}\varepsilon_j\varepsilon_k(A^*A)_{j,k}\Big|, \tag{1.7}$$

which is a homogeneous chaos process of order 2 indexed by the positive semidefinite matrices $A^*A$. Talagrand [44] considers general homogeneous chaos processes of the form

$$Y = \sup_{B \in \mathcal{B}}\Big|\sum_{j \neq k}\varepsilon_j\varepsilon_kB_{j,k}\Big|,$$

where $\mathcal{B} \subset \mathbb{C}^{n \times n}$ is a set of (not necessarily positive semidefinite) matrices. He derives the bound

$$\mathbb{E}Y \le C_1\gamma_2(\mathcal{B}, \|\cdot\|_F) + C_2\gamma_1(\mathcal{B}, \|\cdot\|_{2\to2}) \tag{1.8}$$

(see Section 2.1 for the definition of the $\gamma_\alpha$-functionals). This estimate was an essential component in the proofs of the previous bounds for the restricted isometry constants of partial random circulant matrices [38] and of random Gabor synthesis matrices [34]. In fact, the appearance of the $\gamma_1$-functional leads to the non-optimal exponent 3/2 in the sparsity $s$ in the estimate of the required number $m$ of samples. In contrast, as our bound for the chaos at hand does not involve the $\gamma_1$-functional but only the $\gamma_2$-functional, this issue does not arise here.

Remark. The benchmark problems of estimating the singular values and the restricted isometry constant of a Bernoulli matrix (with independent $\pm 1$ entries) can also be recast as a supremum of chaos processes of the form (1.4). The bounds resulting from Theorem 1.4 are then optimal up to a constant factor. Again, we are not aware of a way to deduce such bounds from (1.8).
2 Preliminaries
2.1 Chaining 

The following definition is due to Talagrand [44] and forms the core of the generic chaining 
methodology. 

Definition 2.1 For a metric space $(T, d)$, an admissible sequence of $T$ is a collection of subsets of $T$, $\{T_s : s \ge 0\}$, such that for every $s \ge 1$, $|T_s| \le 2^{2^s}$ and $|T_0| = 1$. For $\beta \ge 1$, define the $\gamma_\beta$ functional by

$$\gamma_\beta(T, d) = \inf\sup_{t \in T}\sum_{s=0}^\infty 2^{s/\beta}d(t, T_s),$$

where the infimum is taken with respect to all admissible sequences of $T$.
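Since $\gamma_\beta$ is an infimum over admissible sequences, any single admissible sequence yields an upper bound. The Python sketch below (assuming NumPy; the function name and the greedy farthest-point construction are our own choices, not part of the text) builds one admissible sequence for a finite set $T$ by taking $T_0$ to be one point and $T_s$ the first $\min(|T|, 2^{2^s})$ points of a farthest-point ordering, and evaluates $\sup_t\sum_s 2^{s/2}d(t, T_s)$ for $\beta = 2$. The sum is stopped once $T_s = T$, since all later terms vanish.

```python
import numpy as np


def gamma2_upper_bound(points, dist):
    """Upper bound on gamma_2(T, d) for a finite set T given as a list of points."""
    N = len(points)
    # greedy farthest-point ordering of the indices 0, ..., N-1
    order = [0]
    min_d = np.array([dist(points[0], p) for p in points], dtype=float)
    while len(order) < N:
        remaining = [i for i in range(N) if i not in order]
        nxt = max(remaining, key=lambda i: min_d[i])      # farthest unselected point
        order.append(nxt)
        min_d = np.minimum(min_d, [dist(points[nxt], p) for p in points])
    bound = 0.0
    for t in points:
        total, s = 0.0, 0
        while True:
            # |T_0| = 1 and |T_s| <= 2**(2**s) for s >= 1
            size = 1 if s == 0 else min(N, 2 ** (2 ** s))
            d_t = min(dist(t, points[i]) for i in order[:size])
            total += 2 ** (s / 2) * d_t
            if size == N:   # T_s = T, so all remaining terms are zero
                break
            s += 1
        bound = max(bound, total)
    return bound


if __name__ == "__main__":
    # operator-norm metric on a small symmetric family of matrices
    family = [np.eye(2), -np.eye(2), np.diag([1.0, -1.0]), np.diag([-1.0, 1.0])]
    print(gamma2_upper_bound(family, lambda A, B: np.linalg.norm(A - B, 2)))
```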

Recall that for a metric space $(T, d)$ and $u > 0$, the covering number $N(T, d, u)$ is the minimal number of open balls of radius $u$ in $(T, d)$ needed to cover $T$. The $\gamma_\alpha$-functionals can be bounded in terms of such covering numbers by the well-known Dudley integral (see, e.g., [?]). A more specific formulation for the $\gamma_2$-functional of a set of matrices $\mathcal{A}$ equipped with the operator norm, the scenario which we will focus on in this article, is

$$\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) \le c\int_0^{d_{2\to2}(\mathcal{A})}\sqrt{\log N(\mathcal{A}, \|\cdot\|_{2\to2}, u)}\,du. \tag{2.1}$$

This type of entropy integral was introduced by Dudley [15] to bound the supremum of Gaussian processes, and was extended by Pisier [35] as a way of bounding processes that satisfy different decay properties.

When considered for a set $T \subset \ell_2$, $\gamma_2$ has close connections with properties of the canonical Gaussian process indexed by $T$; we refer the reader to [?, 44] for detailed expositions on these connections. One can show that under mild measurability assumptions, if $\{G_t : t \in T\}$ is a centered Gaussian process indexed by a set $T$, then

$$c_1\gamma_2(T, d) \le \mathbb{E}\sup_{t \in T}G_t \le c_2\gamma_2(T, d), \tag{2.2}$$

where $c_1$ and $c_2$ are absolute constants, and for every $s, t \in T$, $d^2(s, t) = \mathbb{E}|G_s - G_t|^2$. The upper bound is due to Fernique [17] and the lower bound is Talagrand's majorizing measures theorem [42, 44].
2.2 Subgaussian random vectors

In this section, we will discuss different classes of random vectors that are needed in the formulation of the main results in a more general framework. In the following definition, $S^{n-1}$ denotes the unit sphere in $\mathbb{R}^n$ (resp. in $\mathbb{C}^n$).

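Definition 2.2 A mean-zero random vector $X$ on $\mathbb{C}^n$ is called isotropic if for every $\theta \in S^{n-1}$, $\mathbb{E}|\langle X, \theta \rangle|^2 = 1$. A random vector $X$ is called $L$-subgaussian if it is isotropic and $\mathbb{P}(|\langle X, \theta \rangle| \ge t) \le 2\exp(-t^2/(2L^2))$ for every $\theta \in S^{n-1}$ and any $t > 0$.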

It is well known that, up to an absolute constant, the tail estimates in the definition of a subgaussian random vector are equivalent to the moment characterization

$$\sup_{\theta \in S^{n-1}}\big(\mathbb{E}|\langle X, \theta \rangle|^p\big)^{1/p} \lesssim \sqrt{p}\,L. \tag{2.3}$$

Assume that a random vector $\xi$ has independent coordinates $\xi_i$, each of which is an $L$-subgaussian random variable of mean zero and variance one. One may verify by direct computation that $\xi$ is $L$-subgaussian. Rademacher vectors, standard Gaussian vectors (that is, random vectors with independent normally distributed entries of mean zero and variance one), as well as Steinhaus vectors (that is, random vectors with independent entries that are uniformly distributed on $\{z \in \mathbb{C} : |z| = 1\}$), are examples of isotropic, $L$-subgaussian random vectors for an absolute constant $L$.
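The isotropy condition $\mathbb{E}|\langle X, \theta \rangle|^2 = 1$ for these three examples can be checked empirically. The Monte Carlo sketch below assumes NumPy and the convention $\langle X, \theta \rangle = \sum_j X_j\overline{\theta_j}$; the helper names are hypothetical, and the estimates only approximate the exact value 1.

```python
import numpy as np


def sample_vectors(kind, num, n, rng):
    """Draw num independent copies of a length-n random vector of the given kind."""
    if kind == "rademacher":
        return rng.choice([-1.0, 1.0], size=(num, n))
    if kind == "gaussian":
        return rng.standard_normal((num, n))
    if kind == "steinhaus":
        return np.exp(2j * np.pi * rng.random((num, n)))   # uniform on the unit circle
    raise ValueError(f"unknown kind: {kind}")


def empirical_second_moment(samples, theta):
    """Monte Carlo estimate of E|<X, theta>|^2 with <X, theta> = sum_j X_j conj(theta_j)."""
    return float(np.mean(np.abs(samples @ theta.conj()) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 5
    theta = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    theta = theta / np.linalg.norm(theta)          # a point on the complex unit sphere
    for kind in ["rademacher", "gaussian", "steinhaus"]:
        print(kind, empirical_second_moment(sample_vectors(kind, 100000, n, rng), theta))
```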

We will require the following well-known bound relating strong and weak moments. For 
convenience, a proof based on chaining and the majorizing measures theorem is provided 
in the appendix. 

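Theorem 2.3 Let $x_1, \dots, x_n \in \mathbb{C}^k$ and $T \subset \mathbb{C}^k$. If $\xi$ is an isotropic, $L$-subgaussian random vector and $Y = \sum_{j=1}^n \xi_jx_j$, then for every $p \ge 1$,

$$\Big(\mathbb{E}\sup_{t \in T}|\langle t, Y \rangle|^p\Big)^{1/p} \le c\Big(\mathbb{E}\sup_{t \in T}|\langle t, G \rangle| + \sup_{t \in T}\big(\mathbb{E}|\langle t, Y \rangle|^p\big)^{1/p}\Big), \tag{2.4}$$

where $c$ is a constant which depends only on $L$ and $G = \sum_{j=1}^n g_jx_j$ for $g_1, \dots, g_n$ independent standard normal random variables.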

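Note that if $\|\cdot\|$ is some norm on $\mathbb{C}^k$ and $B_*$ is the unit ball in the dual norm of $\|\cdot\|$, then the above theorem implies that

$$\big(\mathbb{E}\|Y\|^p\big)^{1/p} \le c\Big(\mathbb{E}\|G\| + \sup_{t \in B_*}\big(\mathbb{E}|\langle t, Y \rangle|^p\big)^{1/p}\Big).$$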

In the remainder of this article, we will state and prove generalizations of our main results Theorem 1.1, Theorem 1.3 and Theorem 1.4 to arbitrary isotropic vectors whose coordinates are independent $L$-subgaussian random variables. Since a Rademacher vector has all these properties, the above formulations of our results will directly follow.

2.3 Further probabilistic tools 

The following decoupling inequality is a slight variation of a result found for instance in [12], see also [6, 37].
Theorem 2.4 Let $\xi = (\xi_1, \dots, \xi_n)$ be a sequence of independent, centered random variables, and let $F$ be a convex function. If $\mathcal{B}$ is a collection of matrices and $\xi'$ is an independent copy of $\xi$, then

$$\mathbb{E}\sup_{B \in \mathcal{B}}F\Big(\sum_{\substack{j,k=1 \\ j \neq k}}^n \xi_j\xi_kB_{j,k}\Big) \le \mathbb{E}\sup_{B \in \mathcal{B}}F\Big(4\sum_{j,k=1}^n \xi_j\xi'_kB_{j,k}\Big). \tag{2.5}$$



We also require a slightly stronger decoupling inequality, which is valid in the Gaussian case and follows from specializing the results of [1, Section 2] to an order 2 Gaussian chaos.

Theorem 2.5 There exists an absolute constant $C$ such that the following holds for all $p \ge 1$. Let $g = (g_1, \dots, g_n)$ be a sequence of independent standard normal random variables. If $\mathcal{B}$ is a collection of Hermitian matrices and $g'$ is an independent copy of $g$, then

$$\mathbb{E}\sup_{B \in \mathcal{B}}\Big|\sum_{\substack{j,k=1 \\ j \neq k}}^n g_jg_kB_{j,k} + \sum_{j=1}^n(g_j^2 - 1)B_{j,j}\Big|^p \le C^p\,\mathbb{E}\sup_{B \in \mathcal{B}}\Big|\sum_{j,k=1}^n g_jg'_kB_{j,k}\Big|^p. \tag{2.6}$$



Since some steps in our estimates are formulated in terms of moments, the transition 
to a tail bound can be established by the following standard estimate, which easily follows 
from Markov's inequality. 

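Proposition 2.6 Suppose $Z$ is a random variable satisfying

$$\big(\mathbb{E}|Z|^p\big)^{1/p} \le \alpha + \beta\sqrt{p} + \gamma p \quad \text{for all } p \ge p_0$$

for some $\alpha, \beta, \gamma, p_0 > 0$. Then, for $u \ge p_0$,

$$\mathbb{P}\big(|Z| \ge e(\alpha + \beta\sqrt{u} + \gamma u)\big) \le e^{-u}.$$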
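The text notes that Proposition 2.6 follows easily from Markov's inequality. One way to spell this out (our sketch of the standard argument) is to apply Markov's inequality to $|Z|^u$, that is, to choose $p = u \ge p_0$:

```latex
% Markov's inequality applied to |Z|^u, with p = u \ge p_0:
\mathbb{P}\bigl(|Z| \ge e\,\|Z\|_{L_u}\bigr)
  \le \frac{\mathbb{E}|Z|^u}{e^u\,\|Z\|_{L_u}^u} = e^{-u}.
% Since \|Z\|_{L_u} \le \alpha + \beta\sqrt{u} + \gamma u for u \ge p_0,
\{|Z| \ge e(\alpha + \beta\sqrt{u} + \gamma u)\} \subseteq \{|Z| \ge e\,\|Z\|_{L_u}\},
% and therefore
\mathbb{P}\bigl(|Z| \ge e(\alpha + \beta\sqrt{u} + \gamma u)\bigr) \le e^{-u}.
```

If $\|Z\|_{L_u} = 0$, then $Z = 0$ almost surely and the claim is trivial.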

2.4 Notation 

Absolute constants will be denoted by $c_1, c_2, \dots$; their value may change from line to line. We write $A \lesssim B$ if there is an absolute constant $c_1$ for which $A \le c_1B$. $A \sim B$ means that $c_1A \le B \le c_2A$ for absolute constants $c_1$ and $c_2$. If the constants depend on some parameter $r$ we will write $A \lesssim_r B$ or $A \sim_r B$.

The $L_p$-norm of a random variable, or its $p$-th moment, is given by $\|X\|_{L_p} = (\mathbb{E}|X|^p)^{1/p}$. For a random variable $X$ independent from all other random variables which appear, we denote the expectation and probability conditional on all variables except $X$ by $\mathbb{E}_X$ and $\mathbb{P}_X$, respectively. The canonical unit vectors in $\mathbb{C}^n$ are denoted $e_j$ and $B_2$ is the unit $\ell_2$-ball in $\mathbb{C}^n$.

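Finally, we introduce shorthand notations for some quantities that we will study. To that end, let $\mathcal{A}$ be a set of matrices on $\mathbb{R}^n$ or on $\mathbb{C}^n$ and let $\xi = (\xi_j)_{j=1}^n$ be a random vector. For a given matrix $A$, denote its $j$-th column by $A^j$ and set

$$N_{\mathcal{A}}(\xi) := \sup_{A \in \mathcal{A}}\|A\xi\|_2, \qquad B_{\mathcal{A}}(\xi) := \sup_{A \in \mathcal{A}}\Big|\sum_{\substack{j,k=1 \\ j \neq k}}^n \xi_j\xi_k\langle A^j, A^k \rangle\Big|,$$

$$D_{\mathcal{A}}(\xi) := \sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\big(|\xi_j|^2 - \mathbb{E}|\xi_j|^2\big)\|A^j\|_2^2\Big|, \qquad \text{and} \qquad C_{\mathcal{A}}(\xi) := \sup_{A \in \mathcal{A}}\big|\|A\xi\|_2^2 - \mathbb{E}\|A\xi\|_2^2\big|.$$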
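For a finite family $\mathcal{A}$ and one realization of $\xi$, these four quantities can be computed directly. The sketch below assumes NumPy, a real-valued sample $\xi$, and independent mean-zero entries (so that $\mathbb{E}\|A\xi\|_2^2 = \sum_j \mathbb{E}|\xi_j|^2\|A^j\|_2^2$); the function name is our own. It illustrates the elementary inequality $C_{\mathcal{A}}(\xi) \le B_{\mathcal{A}}(\xi) + D_{\mathcal{A}}(\xi)$ used in Section 3, and that $D_{\mathcal{A}}(\varepsilon) = 0$ for a Rademacher vector.

```python
import numpy as np


def chaos_quantities(matrices, xi, second_moments=None):
    """Return (N_A, B_A, D_A, C_A) for a finite family of matrices and one real sample xi.

    second_moments[j] = E|xi_j|^2 (defaults to 1, i.e. unit-variance entries).
    """
    xi = np.asarray(xi, dtype=float)
    if second_moments is None:
        second_moments = np.ones_like(xi)
    N = B = D = C = 0.0
    for A in matrices:
        G = A.conj().T @ A                        # inner products of the columns of A
        col_sq = np.real(np.diag(G))              # ||A^j||_2^2
        G_off = G - np.diag(np.diag(G))
        norm_sq = np.linalg.norm(A @ xi) ** 2
        N = max(N, np.sqrt(norm_sq))
        B = max(B, abs(xi @ G_off @ xi))          # off-diagonal chaos
        D = max(D, abs(np.sum((xi ** 2 - second_moments) * col_sq)))   # diagonal part
        C = max(C, abs(norm_sq - np.sum(second_moments * col_sq)))
    return N, B, D, C


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    family = [rng.standard_normal((4, 6)) for _ in range(5)]
    print(chaos_quantities(family, rng.standard_normal(6)))
```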
AeA 
3 Chaos processes

We are now well-equipped to prove the following generalized version of Theorem 1.4.

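Theorem 3.1 Let $\mathcal{A}$ be a set of matrices, and let $\xi$ be a random vector whose entries $\xi_j$ are independent, mean-zero, variance 1, and $L$-subgaussian random variables. Set

$$E = \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + d_F(\mathcal{A})d_{2\to2}(\mathcal{A}),$$

$$V = d_{2\to2}(\mathcal{A})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big), \quad \text{and} \quad U = d_{2\to2}^2(\mathcal{A}).$$

Then, for $t > 0$,

$$\mathbb{P}\big(C_{\mathcal{A}}(\xi) \ge c_1E + t\big) \le 2\exp\Big(-c_2\min\Big\{\frac{t^2}{V^2}, \frac{t}{U}\Big\}\Big).$$

The constants $c_1, c_2$ depend only on $L$.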

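Remark 3.2 Theorem 3.1 directly implies the tail estimate of Theorem 1.4. Indeed, the symmetry assumption $\mathcal{A} = -\mathcal{A}$ ensures that $d_{2\to2}(\mathcal{A}) \le \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})$, so the term $d_F(\mathcal{A})d_{2\to2}(\mathcal{A})$ in $E$ is dominated by $d_F(\mathcal{A})\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})$. The estimate for the expectation in Theorem 1.4 follows from Theorem 3.4 below by choosing $p = 1$.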

The proof is based on estimating the moments of the random variables $N_{\mathcal{A}}$ and $C_{\mathcal{A}}$, followed by an application of Proposition 2.6. The first step is a bound on the moments of a decoupled version of $C_{\mathcal{A}}$.

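Lemma 3.3 Let $\mathcal{A}$ be a set of matrices, let $\xi = (\xi_j)_{j=1}^n$ be an $L$-subgaussian random vector, and let $\xi'$ be an independent copy of $\xi$. Then for every $p \ge 1$,

$$\Big\|\sup_{A \in \mathcal{A}}|\langle A\xi, A\xi' \rangle|\Big\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\|N_{\mathcal{A}}(\xi)\|_{L_p} + \sup_{A \in \mathcal{A}}\|\langle A\xi, A\xi' \rangle\|_{L_p}.$$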

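Proof. The proof is based on a chaining argument. Since the space involved is finite dimensional, one may assume without loss of generality that $\mathcal{A}$ is finite. Fix an (almost) optimal admissible sequence $(T_r)$ of $\mathcal{A}$ for $\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})$, let $\pi_rA = \operatorname{argmin}_{B \in T_r}\|B - A\|_{2\to2}$ and set $\Delta_rA = \pi_rA - \pi_{r-1}A$. Since $\mathcal{A}$ is finite, there is some $r_0$ for which $|\mathcal{A}| \le 2^{2^{r_0}}$, so that we may take $T_{r_0} = \mathcal{A}$ and hence $\pi_{r_0}A = A$. Given $p \ge 1$, let $\ell$ be the largest integer for which $2^\ell \le p$; we may assume that $\ell < r_0$, as the modifications needed when $\ell \ge r_0$ are minimal.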

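Note that for every $A \in \mathcal{A}$,

$$\big|\langle A\xi, A\xi' \rangle - \langle(\pi_\ell A)\xi, (\pi_\ell A)\xi' \rangle\big| \le \sum_{r=\ell}^{r_0-1}|\langle(\Delta_{r+1}A)\xi, (\pi_{r+1}A)\xi' \rangle| + \sum_{r=\ell}^{r_0-1}|\langle(\pi_rA)\xi, (\Delta_{r+1}A)\xi' \rangle|.$$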

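Furthermore, conditionally on $\xi'$,

$$\langle(\Delta_{r+1}A)\xi, (\pi_{r+1}A)\xi' \rangle = \langle \xi, (\Delta_{r+1}A)^*(\pi_{r+1}A)\xi' \rangle$$

is a subgaussian random variable, as for every $u > 0$,

$$\mathbb{P}_\xi\Big(|\langle \xi, (\Delta_{r+1}A)^*(\pi_{r+1}A)\xi' \rangle| \ge uL\,\|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2\Big) \le 2\exp(-u^2/2). \tag{3.1}$$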

Recall that $|\{\pi_rA : A \in \mathcal{A}\}| = |T_r| \le 2^{2^r}$, so there are at most $2^{2^{r+2}}$ possible values that $(\Delta_{r+1}A)^*\pi_{r+1}A$ can assume in (3.1). Therefore, via a union bound over all these choices and over all the levels $\ell \le r < r_0$ (cf. (A.2) below), it is evident that there are constants $c_1, c_2 > 0$ for which, if $t \ge c_1$, then with $\mathbb{P}_\xi$-probability at least $1 - 2\exp(-c_22^\ell t^2)$ one has for every $\ell \le r < r_0$ and every $A \in \mathcal{A}$

$$|\langle(\Delta_{r+1}A)\xi, (\pi_{r+1}A)\xi' \rangle| \le t\,2^{r/2}\,\|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2. \tag{3.2}$$

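Let $\mathcal{E}_t(\xi')$ be the event for which (3.2) holds for all the possible choices of $r$ and $A$ as above. Since

$$\|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2 \le \|\Delta_{r+1}A\|_{2\to2}\sup_{B \in \mathcal{A}}\|B\xi'\|_2 = \|\Delta_{r+1}A\|_{2\to2}N_{\mathcal{A}}(\xi'),$$

one has on $\mathcal{E}_t(\xi')$

$$S_1 := \sup_{A \in \mathcal{A}}\sum_{r=\ell}^{r_0-1}|\langle(\Delta_{r+1}A)\xi, (\pi_{r+1}A)\xi' \rangle| \le t\sup_{A \in \mathcal{A}}\sum_{r=\ell}^{r_0-1}2^{r/2}\|\Delta_{r+1}A\|_{2\to2}N_{\mathcal{A}}(\xi') \lesssim t\,\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})N_{\mathcal{A}}(\xi').$$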

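We will now estimate

$$\|S_1\|_{L_p}^p = \mathbb{E}_{\xi'}\mathbb{E}_\xi S_1^p = \mathbb{E}_{\xi'}\int_0^\infty pt^{p-1}\,\mathbb{P}_\xi(S_1 > t \mid \xi')\,dt. \tag{3.3}$$

Setting $W(\xi') = \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})N_{\mathcal{A}}(\xi')$, observe that

$$\begin{aligned} \int_0^\infty pt^{p-1}\,\mathbb{P}_\xi(S_1 > t \mid \xi')\,dt &\le (c_3W(\xi'))^p + \int_{c_3W(\xi')}^\infty pt^{p-1}\,\mathbb{P}_\xi(S_1 > t \mid \xi')\,dt \\ &\le (c_3W(\xi'))^p + W(\xi')^p\int_{c_3}^\infty pu^{p-1}\,\mathbb{P}_\xi(S_1 > uW(\xi') \mid \xi')\,du \le (c_4W(\xi'))^p, \end{aligned}$$

where $c_3 > c_1$ and $c_4$ are constants that depend only on $L$. Indeed, for $u \ge c_1$, and recalling that $2^{\ell+1} > p$,

$$\mathbb{P}_\xi(S_1 > uW(\xi') \mid \xi') \le \mathbb{P}_\xi(\mathcal{E}_u(\xi')^c \mid \xi') \le 2\exp(-c_2u^22^\ell) \le 2\exp(-c_2u^2p/2).$$

Repeating this argument for $S_2 = \sup_{A \in \mathcal{A}}\sum_{r=\ell}^{r_0-1}|\langle(\pi_rA)\xi, (\Delta_{r+1}A)\xi' \rangle|$, it follows that

$$\|S_1 + S_2\|_{L_p} \le c_5(L)\,\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\|N_{\mathcal{A}}(\xi)\|_{L_p}.$$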

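Finally, since $|\{\pi_\ell A : A \in \mathcal{A}\}| \le 2^{2^\ell} \le \exp(p)$, we conclude

$$\mathbb{E}\sup_{A \in \mathcal{A}}|\langle(\pi_\ell A)\xi, (\pi_\ell A)\xi' \rangle|^p \le \mathbb{E}\sum_{A \in T_\ell}|\langle A\xi, A\xi' \rangle|^p \le \exp(p)\sup_{A \in \mathcal{A}}\mathbb{E}|\langle A\xi, A\xi' \rangle|^p.$$

Thus $\big\|\sup_{A \in \mathcal{A}}|\langle(\pi_\ell A)\xi, (\pi_\ell A)\xi' \rangle|\big\|_{L_p} \le e\sup_{A \in \mathcal{A}}\|\langle A\xi, A\xi' \rangle\|_{L_p}$, which completes the proof. ■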

With these preliminary results at hand, we can now proceed to establish moment bounds for the quantities in question.

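Theorem 3.4 Let $L \ge 1$ and $\xi = (\xi_j)_{j=1}^n$, where $\xi_j$, $j = 1, \dots, n$, are independent mean-zero, variance one, $L$-subgaussian random variables, and let $\mathcal{A}$ be a class of matrices. Then for every $p \ge 1$,

(a) $\|N_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to2}(\mathcal{A})$,

(b) $\|C_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to2}(\mathcal{A})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + p\,d_{2\to2}^2(\mathcal{A})$.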
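Proof. We start by showing that if $\xi'$ is an independent copy of $\xi$, then

$$\sup_{A \in \mathcal{A}}\|\langle A\xi, A\xi' \rangle\|_{L_p} \lesssim_L \sqrt{p}\,d_F(\mathcal{A})d_{2\to2}(\mathcal{A}) + p\,d_{2\to2}^2(\mathcal{A}). \tag{3.4}$$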



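Indeed, fix $A \in \mathcal{A}$ and set $S = \{A^*Ax : x \in B_2\}$. Since the random vector $\xi$ is $L$-subgaussian, the random variable $\langle \xi, A^*A\xi' \rangle$ is subgaussian conditionally on $\xi'$. Therefore, by (2.3),

$$\|\langle A\xi, A\xi' \rangle\|_{L_p} = \Big(\mathbb{E}_{\xi'}\mathbb{E}_\xi|\langle \xi, A^*A\xi' \rangle|^p\Big)^{1/p} \lesssim \Big(\mathbb{E}_{\xi'}\big(L\sqrt{p}\,\|A^*A\xi'\|_2\big)^p\Big)^{1/p} = L\sqrt{p}\,\Big(\mathbb{E}_{\xi'}\sup_{y \in S}|\langle y, \xi' \rangle|^p\Big)^{1/p}.$$

Note that for a standard Gaussian vector $g$,

$$\mathbb{E}\sup_{y \in S}|\langle y, g \rangle| = \mathbb{E}\|A^*Ag\|_2 \le \big(\mathbb{E}\|A^*Ag\|_2^2\big)^{1/2} = \|A^*A\|_F \le \|A\|_F\|A\|_{2\to2}.$$

Also, if $\xi' = \sum_{j=1}^n \xi'_je_j$, then, applying (2.3) again,

$$\sup_{y \in S}\big(\mathbb{E}|\langle y, \xi' \rangle|^p\big)^{1/p} = \sup_{z \in B_2}\big(\mathbb{E}|\langle A^*Az, \xi' \rangle|^p\big)^{1/p} \lesssim \sup_{z \in B_2}L\sqrt{p}\,\|A^*Az\|_2 = L\sqrt{p}\,\|A\|_{2\to2}^2.$$

Hence, (3.4) follows by applying Theorem 2.3 and taking the supremum over $A \in \mathcal{A}$.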

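As a preliminary step, we will consider the special case of the Gaussian vector $g$ and show that $\mathbb{E}N_{\mathcal{A}}(g) \lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})$, which is (a) for $p = 1$. To that end, observe that by Theorem 2.5 and Lemma 3.3,

$$\begin{aligned} \|C_{\mathcal{A}}(g)\|_{L_p} &= \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j \neq k}g_jg_k\langle A^j, A^k \rangle + \sum_{j=1}^n(g_j^2 - 1)\|A^j\|_2^2\Big|\Big\|_{L_p} \\ &\lesssim \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j,k}g_jg'_k\langle A^j, A^k \rangle\Big|\Big\|_{L_p} = \Big\|\sup_{A \in \mathcal{A}}|\langle Ag, Ag' \rangle|\Big\|_{L_p} \\ &\lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\|N_{\mathcal{A}}(g)\|_{L_p} + \sup_{A \in \mathcal{A}}\|\langle Ag, Ag' \rangle\|_{L_p}. \end{aligned} \tag{3.5}$$

Combining (3.4) with (3.5), it follows that

$$\|C_{\mathcal{A}}(g)\|_{L_p} \lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\|N_{\mathcal{A}}(g)\|_{L_p} + \sqrt{p}\,d_F(\mathcal{A})d_{2\to2}(\mathcal{A}) + p\,d_{2\to2}^2(\mathcal{A}). \tag{3.6}$$

Specifying $p = 1$ and using that $d_F(\mathcal{A}) \ge d_{2\to2}(\mathcal{A})$ as well as $\mathbb{E}\|Ag\|_2^2 = \|A\|_F^2$, we conclude

$$\mathbb{E}N_{\mathcal{A}}^2(g) \le \mathbb{E}C_{\mathcal{A}}(g) + d_F^2(\mathcal{A}) \lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\mathbb{E}N_{\mathcal{A}}(g) + d_F^2(\mathcal{A}).$$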
Therefore,

$$\mathbb{E}N_{\mathcal{A}}(g) \le \big(\mathbb{E}N_{\mathcal{A}}^2(g)\big)^{1/2} \lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A}),$$

as desired.

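Finally, for general $p$ and arbitrary subgaussian vectors, we apply Theorem 2.3 with the set $S = \{A^*x : x \in B_2, A \in \mathcal{A}\}$. Since $\xi$ is $L$-subgaussian, we obtain

$$\begin{aligned} \|N_{\mathcal{A}}(\xi)\|_{L_p} &= \Big(\mathbb{E}\sup_{A \in \mathcal{A},\, x \in B_2}|\langle A\xi, x \rangle|^p\Big)^{1/p} = \Big(\mathbb{E}\sup_{u \in S}|\langle \xi, u \rangle|^p\Big)^{1/p} \\ &\lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sup_{u \in S}\big(\mathbb{E}|\langle \xi, u \rangle|^p\big)^{1/p} \lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sqrt{p}\sup_{A \in \mathcal{A},\, x \in B_2}\|A^*x\|_2 \\ &\lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sqrt{p}\,d_{2\to2}(\mathcal{A}) \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to2}(\mathcal{A}), \end{aligned}$$

which proves (a).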



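For (b), observe that as the $\xi_j$ are of unit variance, we have $\mathbb{E}\|A\xi\|_2^2 = \|A\|_F^2 = \sum_{j=1}^n\|A^j\|_2^2$ and, consequently, $C_{\mathcal{A}}(\xi)$ can be split up into the diagonal and the off-diagonal contributions as follows:

$$C_{\mathcal{A}}(\xi) = \sup_{A \in \mathcal{A}}\Big|\sum_{j \neq k}\xi_j\xi_k\langle A^j, A^k \rangle + \sum_{j=1}^n(|\xi_j|^2 - 1)\|A^j\|_2^2\Big| \le B_{\mathcal{A}}(\xi) + D_{\mathcal{A}}(\xi).$$

Hence it suffices to estimate the moments of $B_{\mathcal{A}}(\xi)$ and $D_{\mathcal{A}}(\xi)$; one concludes using the triangle inequality.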

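For the off-diagonal term, we use Theorem 2.4 and Lemma 3.3 to bound

$$\|B_{\mathcal{A}}(\xi)\|_{L_p} = \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j \neq k}\xi_j\xi_k\langle A^j, A^k \rangle\Big|\Big\|_{L_p} \le 4\Big\|\sup_{A \in \mathcal{A}}|\langle A\xi, A\xi' \rangle|\Big\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\,\|N_{\mathcal{A}}(\xi)\|_{L_p} + \sup_{A \in \mathcal{A}}\|\langle A\xi, A\xi' \rangle\|_{L_p}.$$

Combining this estimate with (3.4) and part (a), we obtain that

$$\|B_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to2}(\mathcal{A})\big) + \sqrt{p}\,d_F(\mathcal{A})d_{2\to2}(\mathcal{A}) + p\,d_{2\to2}^2(\mathcal{A}). \tag{3.7}$$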



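For the diagonal term, observe that by a standard symmetrization argument (see, e.g., [27, Lemma 6.3]),

$$\|D_{\mathcal{A}}(\xi)\|_{L_p} = \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\big(|\xi_j|^2 - \mathbb{E}|\xi_j|^2\big)\|A^j\|_2^2\Big|\Big\|_{L_p} \le 2\Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\varepsilon_j|\xi_j|^2\|A^j\|_2^2\Big|\Big\|_{L_p},$$

where $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is a Rademacher vector independent of $\xi$. Furthermore, let $g = (g_1, \dots, g_n)$ be a sequence of independent standard normal variables. Then, as $\xi_j$ is $L$-subgaussian, there is an absolute constant $c$ for which $\mathbb{P}(|\xi_j|^2 > tL^2) \le c\,\mathbb{P}(g_j^2 > t)$ for every $t > 0$. Moreover, $\varepsilon_j|\xi_j|^2$ and $\varepsilon_jg_j^2$ are symmetric, so by the contraction principle (see [27, Lemma 4.6]), a rescaling argument, and de-symmetrization [27, Lemma 6.3],

$$\begin{aligned} \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\varepsilon_j|\xi_j|^2\|A^j\|_2^2\Big|\Big\|_{L_p} &\lesssim_L \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\varepsilon_jg_j^2\|A^j\|_2^2\Big|\Big\|_{L_p} \\ &\le 2\|D_{\mathcal{A}}(g)\|_{L_p} + \Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\varepsilon_j\|A^j\|_2^2\Big|\Big\|_{L_p}. \end{aligned}$$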



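Now observe that $D_{\mathcal{A}}(g) \le C_{\mathcal{A}}(g) + B_{\mathcal{A}}(g)$, and thus, by (3.6) and (3.7) (together with part (a)),

$$\|D_{\mathcal{A}}(g)\|_{L_p} \lesssim \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to2}(\mathcal{A})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + p\,d_{2\to2}^2(\mathcal{A}).$$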

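Finally, note that $A \mapsto \sum_{j=1}^n\varepsilon_j\|A^j\|_2^2$ is a subgaussian process relative to the metric

$$d(A, B) = \Big(\sum_{j=1}^n\big(\|A^j\|_2^2 - \|B^j\|_2^2\big)^2\Big)^{1/2} \le \Big(\sum_{j=1}^n\|A^j - B^j\|_2^2\big(\|A^j\|_2 + \|B^j\|_2\big)^2\Big)^{1/2} \le 2d_F(\mathcal{A})\|A - B\|_{2\to2}.$$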
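The metric inequality above can be checked numerically on random pairs of matrices. In the sketch below (assuming NumPy; the helper name is our own), $d_F$ is taken over the two-element family $\{A, B\}$, and `np.linalg.norm(A - B, 2)` is the operator norm $\|A - B\|_{2\to2}$.

```python
import numpy as np


def column_norm_metric(A, B):
    """d(A, B) = (sum_j (||A^j||_2^2 - ||B^j||_2^2)^2)^{1/2} over the columns j."""
    a = np.sum(np.abs(A) ** 2, axis=0)   # squared column norms of A
    b = np.sum(np.abs(B) ** 2, axis=0)   # squared column norms of B
    return float(np.sqrt(np.sum((a - b) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(8)
    A, B = rng.standard_normal((5, 7)), rng.standard_normal((5, 7))
    d_F = max(np.linalg.norm(A, "fro"), np.linalg.norm(B, "fro"))
    print(column_norm_metric(A, B), 2 * d_F * np.linalg.norm(A - B, 2))
```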



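Therefore, by Theorem 2.3 and a standard chaining argument,

$$\Big\|\sup_{A \in \mathcal{A}}\Big|\sum_{j=1}^n\varepsilon_j\|A^j\|_2^2\Big|\Big\|_{L_p} \lesssim d_F(\mathcal{A})\,\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + \sqrt{p}\,d_F(\mathcal{A})d_{2\to2}(\mathcal{A}).$$

This shows that

$$\|D_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to2}(\mathcal{A})\big(\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) + d_F(\mathcal{A})\big) + p\,d_{2\to2}^2(\mathcal{A}),$$

which, together with (3.7), proves (b). ■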

Remark 3.5 (a) Observe that Theorem 3.1 can be deduced from Theorem 3.4 using Proposition 2.6.

(b) In the Rademacher case, once the bound for the expectation is derived, one may alternatively deduce the tail bound from the concentration inequality in [?, Theorem 17], see also [?].

(c) In the Rademacher case, one has $D_{\mathcal{A}}(\varepsilon) = 0$, so the contraction principle and the more sophisticated decoupling inequality for Gaussian random variables are not needed in the proof.

(d) Note that the assumption that $\xi$ has independent coordinates has only been used in the decoupling steps of the proof of Theorem 3.4.
4 The Restricted Isometry Property of Partial Random Circulant Matrices



In this section we study the restricted isometry constants of a partial random circulant matrix $\Phi \in \mathbb{R}^{m \times n}$ generated by a random vector $\xi = (\xi_i)_{i=1}^n$, where the $\xi_i$'s are independent mean-zero, $L$-subgaussian random variables of variance one. Arguably the most important case is when $\xi = \varepsilon$ is a Rademacher vector, as introduced in Section 1.2.

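Throughout this section and following the notation of the introduction, let $V_xz = \frac{1}{\sqrt{m}}P_\Omega(x * z)$, where the projection operator $P_\Omega : \mathbb{C}^n \to \mathbb{C}^n$ is given by $P_\Omega = R_\Omega^*R_\Omega$, that is, $(P_\Omega x)_\ell = x_\ell$ for $\ell \in \Omega$ and $(P_\Omega x)_\ell = 0$ for $\ell \notin \Omega$.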

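With $T_s = \{x \in \mathbb{C}^n : \|x\|_2 \le 1, \|x\|_0 \le s\}$, the restricted isometry constant of $\Phi$ is

$$\delta_s = \sup_{x \in T_s}\Big|\frac{1}{m}\|R_\Omega(\xi * x)\|_2^2 - \|x\|_2^2\Big| = \sup_{x \in T_s}\Big|\frac{1}{m}\|P_\Omega(x * \xi)\|_2^2 - \|x\|_2^2\Big| = \sup_{x \in T_s}\big|\|V_x\xi\|_2^2 - \|x\|_2^2\big|.$$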

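Since $|\Omega| = m$, it follows that

$$\mathbb{E}\|V_x\xi\|_2^2 = \frac{1}{m}\sum_{\ell \in \Omega}\mathbb{E}\Big|\sum_{k=1}^n x_{\ell \ominus k}\xi_k\Big|^2 = \frac{1}{m}\sum_{\ell \in \Omega}\sum_{k=1}^n|x_{\ell \ominus k}|^2 = \|x\|_2^2,$$

and hence

$$\delta_s = \sup_{x \in T_s}\big|\|V_x\xi\|_2^2 - \mathbb{E}\|V_x\xi\|_2^2\big|,$$

which shows that $\delta_s$ is the process $C_{\mathcal{A}}(\xi)$ studied in the previous section for $\mathcal{A} = \{V_x : x \in T_s\}$. Hence the tail decay can be analyzed using Theorem 3.1.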
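The identification $\|\Phi x\|_2^2 = \|V_x\xi\|_2^2$ rests on the commutativity $\xi * x = x * \xi$, and $\mathbb{E}\|V_x\xi\|_2^2 = \|V_x\|_F^2 = \|x\|_2^2$. Both facts can be checked with the following NumPy sketch, which uses 0-based indices and a hypothetical helper `V_matrix` that returns $V_x$ as an $n \times n$ matrix with zero rows outside $\Omega$.

```python
import numpy as np


def V_matrix(x, omega):
    """n x n matrix of z -> m^{-1/2} P_Omega (x * z); rows outside Omega are zero."""
    n, m = len(x), len(omega)
    j, k = np.indices((n, n))
    H_x = x[(j - k) % n]                 # circulant matrix of x: H_x @ z == x * z
    mask = np.zeros(n)
    mask[np.asarray(omega)] = 1.0        # diagonal of the projection P_Omega
    return mask[:, None] * H_x / np.sqrt(m)


if __name__ == "__main__":
    rng = np.random.default_rng(9)
    n, omega = 16, [0, 3, 5, 9, 12]
    x = np.zeros(n)
    x[[1, 7, 10]] = rng.standard_normal(3)
    x /= np.linalg.norm(x)               # a unit-norm 3-sparse vector
    print(np.linalg.norm(V_matrix(x, omega), "fro") ** 2)
```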



Theorem 4.1 Let $\xi = (\xi_j)_{j=1}^n$ be a random vector with independent mean-zero, variance one, $L$-subgaussian entries. If, for $s \le n$ and $\eta, \delta \in (0, 1)$,

$$m \ge c\,\delta^{-2}s\max\big\{(\log^2 s)(\log^2 n), \log(\eta^{-1})\big\}, \tag{4.1}$$

then with probability at least $1 - \eta$, the restricted isometry constant of the partial random circulant matrix $\Phi \in \mathbb{R}^{m \times n}$ generated by $\xi$ satisfies $\delta_s \le \delta$. The constant $c > 0$ depends only on $L$.

The proof of Theorem 4.1 requires a Fourier domain description of $V_x$. Let $F$ be the unnormalized Fourier transform with elements $F_{jk} = e^{2\pi ijk/n}$. By the convolution theorem, for every $1 \le j \le n$, $F(x * y)_j = (Fx)_j \cdot (Fy)_j$. Therefore,

$$V_x\xi = \frac{1}{\sqrt{m}}P_\Omega F^{-1}XF\xi,$$

where $X = \operatorname{diag}(Fx)$ is the diagonal matrix whose diagonal is the Fourier transform $Fx$. In short,

$$V_x = \frac{1}{\sqrt{m}}\hat{P}_\Omega XF,$$

where $\hat{P}_\Omega = P_\Omega F^{-1}$.
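The factorization $V_x = m^{-1/2}P_\Omega F^{-1}\operatorname{diag}(Fx)F$ can be confirmed numerically. The sketch below assumes NumPy and 0-based indices, builds $F$ explicitly with the sign convention $F_{jk} = e^{2\pi ijk/n}$ used in the text (NumPy's own `np.fft.fft` uses the opposite sign), and uses $F^{-1} = F^*/n$. The helper names are our own.

```python
import numpy as np


def fourier_matrix(n):
    """Unnormalized F with F[j, k] = exp(2 pi i j k / n), 0-based indices."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n)


def V_via_fourier(x, omega):
    """m^{-1/2} P_Omega F^{-1} diag(F x) F as a dense n x n matrix."""
    n, m = len(x), len(omega)
    F = fourier_matrix(n)
    F_inv = F.conj().T / n                  # F^* F = n I, hence F^{-1} = F^* / n
    P = np.zeros((n, n))
    P[omega, omega] = 1.0                   # P_Omega: 0/1 diagonal projection
    X = np.diag(F @ x)                      # X = diag(F x)
    return P @ F_inv @ X @ F / np.sqrt(m)


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    n, omega = 12, [1, 4, 7, 10]
    x = rng.standard_normal(n)
    j, k = np.indices((n, n))
    direct = np.zeros((n, n))
    direct[omega, :] = x[(j - k) % n][omega, :] / np.sqrt(len(omega))
    print(np.allclose(V_via_fourier(x, omega), direct))
```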

Proof of Theorem 4.1. In light of Theorem 3.4 and Theorem 3.1, it suffices to control the parameters $d_{2\to2}(\mathcal{A})$, $d_F(\mathcal{A})$, and $\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2})$ for the set $\mathcal{A} = \{V_x : x \in T_s\}$.

Since the matrices $V_x$ consist of shifted copies of $x$ in all of their $m$ nonzero rows, the $\ell_2$-norm of each nonzero row is $m^{-1/2}\|x\|_2$; thus $\|V_x\|_F = \|x\|_2 \le 1$ for all $x \in T_s$ and

$$d_F(\mathcal{A}) = 1.$$
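Also, observe that for every $x \in T_s$ with associated diagonal matrix $\hat{X}$,

$$\|V_x\|_{2\to2} = \frac{1}{\sqrt{m}} \|\hat{P}_\Omega \hat{X} F\|_{2\to2} \le \frac{1}{\sqrt{m}} \|P_\Omega F^{-1}\|_{2\to2} \|\hat{X}\|_{2\to2} \|F\|_{2\to2} \le \frac{1}{\sqrt{m}} \|\hat{X}\|_{2\to2} = \frac{1}{\sqrt{m}} \|Fx\|_\infty, \qquad (4.2)$$

where the second inequality uses $\|P_\Omega\|_{2\to2} \le 1$, $\|F^{-1}\|_{2\to2} = n^{-1/2}$, and $\|F\|_{2\to2} = n^{1/2}$.

Setting $\|x\|_{\hat{\infty}} := \|Fx\|_\infty$, it is evident that $\|Fx\|_\infty \le \|x\|_1 \le \sqrt{s}\|x\|_2 \le \sqrt{s}$ for every $x \in T_s$, and thus

$$d_{2\to2}(\mathcal{A}) \le \sqrt{s/m}.$$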
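A quick numerical check of the chain $\|V_x\|_{2\to2} \le m^{-1/2}\|Fx\|_\infty \le m^{-1/2}\|x\|_1 \le \sqrt{s/m}$ for a random unit-norm $s$-sparse $x$. The sketch uses `numpy.fft.fft`, whose sign convention $e^{-2\pi i jk/n}$ differs from the text; this does not affect $\|Fx\|_\infty$, because the two transforms differ only by the index reflection $j \mapsto -j \bmod n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, s = 32, 10, 4
omega = rng.choice(n, size=m, replace=False)
x = np.zeros(n, dtype=complex)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x /= np.linalg.norm(x)                       # unit-norm s-sparse vector in T_s

# Only the m nonzero rows of V_x matter for the operator norm
V = np.array([[x[(l - q) % n] for q in range(n)] for l in omega]) / np.sqrt(m)
op_norm = np.linalg.norm(V, 2)
fx_inf = np.max(np.abs(np.fft.fft(x)))       # ||Fx||_inf (sign convention irrelevant)
l1 = np.sum(np.abs(x))

print(op_norm <= fx_inf / np.sqrt(m) + 1e-12)            # True, inequality (4.2)
print(fx_inf <= l1 + 1e-12, l1 <= np.sqrt(s) + 1e-12)    # True True
```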

Next, to estimate the $\gamma_2$ functional, recall from (2.1) that

$$\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) \lesssim \int_0^{d_{2\to2}(\mathcal{A})} \log^{1/2} N(\mathcal{A}, \|\cdot\|_{2\to2}, u)\, du.$$



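By (4.2),

$$\|V_x - V_y\|_{2\to2} = \|V_{x-y}\|_{2\to2} \le m^{-1/2} \|x - y\|_{\hat{\infty}},$$

and hence for every $u > 0$, $N(\mathcal{A}, \|\cdot\|_{2\to2}, u) \le N(T_s, m^{-1/2}\|\cdot\|_{\hat{\infty}}, u)$. Using an argument due to Carl [10, Prop. 3] (see also [IT] or [37, Lemma 8.3]), and setting $\lambda = \sqrt{s/m}$, it is evident that

$$\log N(T_s, m^{-1/2}\|\cdot\|_{\hat{\infty}}, u) \le \log N(\sqrt{s}\, B_1^n, m^{-1/2}\|\cdot\|_{\hat{\infty}}, u) \lesssim \frac{\lambda^2}{u^2} \log^2 n.$$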



Since $T_s$ is the union of $s$-dimensional Euclidean balls, a standard volumetric argument (see, e.g., [H] or [37, Chapter 8.4]) yields

$$\log N(T_s, m^{-1/2}\|\cdot\|_{\hat{\infty}}, u) \lesssim s \log\Big(\frac{en}{su}\Big)$$

(which is stronger than the bound above for $u \le 1/\sqrt{m}$).

Combining the two covering number estimates, a straightforward computation of the entropy integral (see also [41] or [37, eq. (8.15)]) reveals that

$$\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) \lesssim \sqrt{\frac{s}{m}}\, (\log s)(\log n),$$

which implies that $\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) \lesssim \delta$ for the given choice of $m$.
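For the reader's convenience, here is one way to carry out the entropy integral, splitting at $u = m^{-1/2}$ and suppressing absolute constants. It assumes $m \le n$ (so that $\log(en\sqrt{m}/s) \lesssim \log n$) and $s, n \ge 3$ (so that the logarithms are at least one), and it is a sketch of the computation rather than a quotation of [37].

```latex
\begin{aligned}
\int_0^{\sqrt{s/m}} \log^{1/2} N(\mathcal{A}, \|\cdot\|_{2\to2}, u)\, du
&\lesssim \int_0^{m^{-1/2}} \sqrt{s \log\Big(\frac{en}{su}\Big)}\, du
 + \int_{m^{-1/2}}^{\sqrt{s/m}} \sqrt{\frac{s}{m}}\, \frac{\log n}{u}\, du \\
&\lesssim \sqrt{\frac{s}{m}}\, \sqrt{\log\Big(\frac{en\sqrt{m}}{s}\Big)}
 + \sqrt{\frac{s}{m}}\, (\log n)\, \log \sqrt{s} \\
&\lesssim \sqrt{\frac{s}{m}}\, (\log s)(\log n).
\end{aligned}
```

The first integral uses the volumetric bound and the elementary estimate $\int_0^a \sqrt{\log(b/u)}\,du \lesssim a\sqrt{\log(b/a)}$ for $b/a \ge e$; the second uses the Carl-type bound, whose square root is $(\lambda/u)\log n$ with $\lambda = \sqrt{s/m}$.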

Now, by choosing the constant $c$ in (4.1) appropriately (depending only on $L$), one obtains

$$c_1 E \le \delta/2,$$

where $E$ and $c_1$ are chosen as in Theorem 3.1. Then Theorem 3.1 yields

$$\mathbb{P}(\delta_s \ge \delta) \le \mathbb{P}(\delta_s \ge c_1 E + \delta/2) \le \exp\big(-c_2 (m/s)\delta^2\big) \le \eta,$$

which, after possibly increasing the value of $c$ enough to compensate $c_2$, completes the proof. ■
5 Time-Frequency Structured Random Matrices

In this section, we treat the restricted isometry property of random Gabor synthesis matrices, as described in Section 1.5.

Theorem 5.1 Let $\xi = (\xi_j)_{j=1}^m$ be a random vector with independent mean-zero, variance one, $L$-subgaussian entries. Let $\Psi_h \in \mathbb{C}^{m \times m^2}$ be the random Gabor synthesis matrix generated by $h = \frac{1}{\sqrt{m}}\xi$. If, for $s \in \mathbb{N}$ and $\delta, \eta \in (0,1)$,

$$m \ge c\,\delta^{-2} s \max\{(\log^2 s)(\log^2 m),\ \log(\eta^{-1})\},$$

then with probability at least $1 - \eta$ the restricted isometry constant of $\Psi_h$ satisfies $\delta_s \le \delta$. The constant $c > 0$ depends only on $L$.

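Before presenting the proof we will need several observations. First, note that for $x \in \mathbb{C}^{m^2}$, $\Psi_h x = V_x \xi$, where the $m \times m$ matrix $V_x$ is given by

$$V_x = \frac{1}{\sqrt{m}} \sum_{\lambda \in \mathbb{Z}_m^2} x_\lambda \pi(\lambda).$$

It is straightforward to check that $\{\frac{1}{\sqrt{m}}\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ is an orthonormal system in the space of complex $m \times m$ matrices endowed with the Frobenius inner product. Therefore,

$$\mathbb{E}\|V_x \xi\|_2^2 = \|V_x\|_F^2 = \|x\|_2^2.$$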
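The orthonormality of $\{\frac{1}{\sqrt{m}}\pi(\lambda)\}$, the identity $\Psi_h x = V_x \xi$, and $\|V_x\|_F = \|x\|_2$ are easy to confirm numerically. Section 1.5 is not reproduced here, so the sketch assumes the common convention $\pi(\lambda) = M_\ell T_k$ for $\lambda = (k, \ell)$, with cyclic translation $(T_k h)_j = h_{j-k}$ and modulation $(M_\ell h)_j = e^{2\pi i \ell j/m} h_j$ (0-based indices); the helper `tf_shift` is a hypothetical name.

```python
import numpy as np

def tf_shift(k, l, m):
    """Assumed convention: pi((k, l)) = M_l T_k acting on C^m."""
    T = np.roll(np.eye(m), k, axis=0)                       # (T_k h)_j = h_{(j - k) mod m}
    M = np.diag(np.exp(2j * np.pi * l * np.arange(m) / m))  # (M_l h)_j = e^{2 pi i l j / m} h_j
    return M @ T

m = 5
rng = np.random.default_rng(4)
pis = [tf_shift(k, l, m) for k in range(m) for l in range(m)]

# Gram matrix of {pi(lambda)/sqrt(m)} in the Frobenius inner product <A, B> = trace(B^* A)
G = np.array([[np.trace(B.conj().T @ A) / m for B in pis] for A in pis])

xi = rng.choice([-1.0, 1.0], size=m)
h = xi / np.sqrt(m)
Psi = np.column_stack([P @ h for P in pis])               # m x m^2 Gabor synthesis matrix
x = rng.standard_normal(m * m) + 1j * rng.standard_normal(m * m)
V_x = sum(c * P for c, P in zip(x, pis)) / np.sqrt(m)

print(np.allclose(G, np.eye(m * m)))                               # True
print(np.allclose(Psi @ x, V_x @ xi))                              # True
print(np.isclose(np.linalg.norm(V_x, "fro"), np.linalg.norm(x)))   # True
```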



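Hence, if $T_s = \{x \in \mathbb{C}^{m^2} : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$ and $\mathcal{A} = \{V_x : x \in T_s\}$, the restricted isometry constant is

$$\delta_s = \sup_{x \in T_s} \big| \|\Psi_h x\|_2^2 - \|x\|_2^2 \big| = \sup_{x \in T_s} \big| \|V_x \xi\|_2^2 - \mathbb{E}\|V_x \xi\|_2^2 \big|.$$

Thus Theorem 3.1 applies again and we need to estimate the associated Dudley integral. Note that, as $\pi(\lambda)$ is unitary, one has for $x \in T_s$

$$\|V_x\|_{2\to2} \le \frac{1}{\sqrt{m}} \sum_{\lambda} |x_\lambda|\, \|\pi(\lambda)\|_{2\to2} = \frac{\|x\|_1}{\sqrt{m}} \le \sqrt{\frac{s}{m}}\, \|x\|_2, \qquad (5.1)$$

so the upper integration limit will be $d_{2\to2}(\mathcal{A}) \le \sqrt{s/m}$.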
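A numerical check of (5.1) for a random unit-norm $s$-sparse coefficient vector. The sketch assumes the convention $\pi(k, \ell) = M_\ell T_k$ with cyclic translation and modulation (Section 1.5 may fix a different but equivalent normalization); the helper name `tf_shift` is hypothetical.

```python
import numpy as np

def tf_shift(k, l, m):
    """Assumed convention: pi((k, l)) = M_l T_k (cyclic shift by k, then modulation by l)."""
    T = np.roll(np.eye(m), k, axis=0)
    M = np.diag(np.exp(2j * np.pi * l * np.arange(m) / m))
    return M @ T

rng = np.random.default_rng(5)
m, s = 16, 5
pis = [tf_shift(k, l, m) for k in range(m) for l in range(m)]

x = np.zeros(m * m, dtype=complex)
support = rng.choice(m * m, size=s, replace=False)
x[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x /= np.linalg.norm(x)                                    # x in T_s

# x vanishes off its support, so summing over the support gives the full V_x
V_x = sum(x[i] * pis[i] for i in support) / np.sqrt(m)
op_norm = np.linalg.norm(V_x, 2)
print(op_norm <= np.sum(np.abs(x)) / np.sqrt(m) + 1e-12)         # True
print(np.sum(np.abs(x)) / np.sqrt(m) <= np.sqrt(s / m) + 1e-12)  # True
```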

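Lemma 5.2 There exists an absolute constant $c$ such that for every $0 < u \le \sqrt{s/m}$,

$$\log N(\mathcal{A}, \|\cdot\|_{2\to2}, u) \le c\, s \Big(\log(em^2/s) + \log\big(3\sqrt{s/m}/u\big)\Big),$$

and

$$\log N(\mathcal{A}, \|\cdot\|_{2\to2}, u) \le c\, \frac{s \log^2 m}{m u^2}.$$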

Before proving the lemma, let us recall the following simple modification of the Maurey lemma, essentially due to Carl [10]. For convenience, a proof is provided in the appendix. Below, for a set $U$ in a vector space, $\operatorname{conv}(U)$ denotes its convex hull.

Lemma 5.3 There exists an absolute constant $c$ for which the following holds. Let $X$ be a normed space, consider a finite set $U \subset X$ of cardinality $N$, and assume that for every $L \in \mathbb{N}$ and $(u_1, \ldots, u_L) \in U^L$, $\mathbb{E}_\varepsilon \|\sum_{j=1}^L \varepsilon_j u_j\|_X \le A\sqrt{L}$, where $(\varepsilon_j)_{j=1}^L$ denotes a Rademacher vector. Then for every $u > 0$,

$$\log N(\operatorname{conv}(U), \|\cdot\|_X, u) \le c\,(A/u)^2 \log N.$$
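The mechanism behind Lemma 5.3 is Maurey's empirical method: a point of $\operatorname{conv}(U)$ is approximated by the average of $L$ points drawn from $U$ according to its convex weights, with error of order $A/\sqrt{L}$. The sketch below illustrates this in the Euclidean space $X = \mathbb{R}^d$ with unit vectors $u_j$, where $\mathbb{E}\|\sum_j \varepsilon_j u_j\|_2 \le (\sum_j \|u_j\|_2^2)^{1/2} = \sqrt{L}$, so $A = 1$ is admissible; the dimensions, seed, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, d = 50, 200
U = rng.standard_normal((N, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)    # unit vectors, so A = 1 works in l_2
theta = rng.random(N)
theta /= theta.sum()                             # convex weights
x = theta @ U                                    # a point of conv(U)
A = 1.0

def empirical_average(L):
    """Y = (1/L) sum_j Z_j with Z_j i.i.d. and P(Z = U[i]) = theta[i]."""
    idx = rng.choice(N, size=L, p=theta)
    return U[idx].mean(axis=0)

def mean_error(L, trials=200):
    """Monte Carlo estimate of E ||x - Y||_2."""
    return float(np.mean([np.linalg.norm(x - empirical_average(L)) for _ in range(trials)]))

for L in (4, 16, 64, 256):
    print(L, round(mean_error(L), 4), "bound 2A/sqrt(L) =", 2 * A / np.sqrt(L))
```

Each average is determined by the $L$ indices drawn, so at most $N^L$ distinct centers can arise; this is the count that turns the approximation into the covering bound of the lemma.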
Proof of Lemma 5.2. Define the norm $\|\cdot\|$ on $\mathbb{C}^{m^2}$ by $\|x\| = \|V_x\|_{2\to2}$, fix $S \subset \mathbb{Z}_m^2$ of cardinality $s$, and put $B_S = \{x \in \mathbb{C}^{m^2} : \|x\|_2 \le 1,\ \operatorname{supp}(x) \subset S\}$. Then, by (5.1) and a volumetric estimate,

$$N(B_S, \|\cdot\|, u) \le N\big(B_S, \sqrt{s/m}\,\|\cdot\|_2, u\big) \le \Big(1 + \frac{2\sqrt{s/m}}{u}\Big)^{2s} \le \Big(\frac{3\sqrt{s/m}}{u}\Big)^{2s},$$

where the last step uses that $u \le \sqrt{s/m}$. Since there are at most $\binom{m^2}{s} \le (em^2/s)^s$ such subsets $S$ of $\mathbb{Z}_m^2$, the first part of the claim follows.
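The count of supports uses $\binom{m^2}{s} \le (em^2/s)^s$, which follows from $\binom{n}{s} \le n^s/s!$ and $s! \ge (s/e)^s$. A short check in logarithms (to avoid overflow) for a few arbitrary values of $m$ and $s$:

```python
import math

def log_binom_bound_holds(m, s):
    """Check log C(m^2, s) <= s * log(e m^2 / s)."""
    n = m * m
    return math.log(math.comb(n, s)) <= s * math.log(math.e * n / s) + 1e-12

for m, s in [(4, 1), (8, 5), (32, 40), (128, 100)]:
    print(m, s, log_binom_bound_holds(m, s))   # True in every case
```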

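To prove the second part, note that $T_s \subset \sqrt{2s}\,\operatorname{conv}\big(\{e_\lambda, ie_\lambda, -e_\lambda, -ie_\lambda\}_{\lambda \in \mathbb{Z}_m^2}\big)$. Consider $(u_j)_{j=1}^L$ selected from the extreme points of this set (with possible repetitions). Then, by the noncommutative Khintchine inequality due to Lust-Piquard [551 HH HO],

$$\mathbb{E}_\varepsilon \Big\|\sum_{j=1}^L \varepsilon_j V_{u_j}\Big\|_{2\to2} \lesssim \sqrt{\log m}\, \max\Big\{ \Big\|\sum_{j=1}^L V_{u_j} V_{u_j}^*\Big\|_{2\to2}^{1/2},\ \Big\|\sum_{j=1}^L V_{u_j}^* V_{u_j}\Big\|_{2\to2}^{1/2} \Big\}.$$

Recall that for every such $u_j$, $V_{u_j} = a\,\pi(\lambda)$ for some $\lambda$ with $|a| = \sqrt{2s/m}$. Therefore, $V_{u_j}^* V_{u_j} = V_{u_j} V_{u_j}^* = (2s/m)\,\mathrm{Id}$ and thus

$$\mathbb{E}_\varepsilon \Big\|\sum_{j=1}^L \varepsilon_j V_{u_j}\Big\|_{2\to2} \lesssim \sqrt{\log m}\, \sqrt{s/m}\, \sqrt{L}.$$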
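The identity $V_u^* V_u = V_u V_u^* = (2s/m)\,\mathrm{Id}$ for the scaled extreme points $u = \sqrt{2s}\,\omega e_\lambda$, $\omega \in \{1, i, -1, -i\}$, is what makes the Khintchine bound explicit: both terms in the maximum equal $\sqrt{2sL/m}$. A numerical check under the assumed convention $\pi(k, \ell) = M_\ell T_k$ (helper names are hypothetical):

```python
import numpy as np

def tf_shift(k, l, m):
    """Assumed convention: pi((k, l)) = M_l T_k."""
    T = np.roll(np.eye(m), k, axis=0)
    M = np.diag(np.exp(2j * np.pi * l * np.arange(m) / m))
    return M @ T

rng = np.random.default_rng(7)
m, s, L = 8, 3, 6
phases = [1, 1j, -1, -1j]

# V_u for u = sqrt(2s) * phase * e_lambda equals (1/sqrt(m)) * sqrt(2s) * phase * pi(lambda)
Vs = []
for _ in range(L):
    k, l = rng.integers(0, m, size=2)
    phase = phases[rng.integers(0, 4)]
    Vs.append(np.sqrt(2 * s) * phase * tf_shift(k, l, m) / np.sqrt(m))

row_term = sum(V @ V.conj().T for V in Vs)
col_term = sum(V.conj().T @ V for V in Vs)
target = 2 * s * L / m * np.eye(m)
print(np.allclose(row_term, target), np.allclose(col_term, target))   # True True
print(np.sqrt(np.linalg.norm(row_term, 2)), np.sqrt(2 * s * L / m))   # agree up to rounding
```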



Applying Lemma 5.3 with $A \simeq \sqrt{s/m}\,\sqrt{\log m}$, it follows that

$$\log N(T_s, \|\cdot\|, u) \lesssim (A/u)^2 \log(4m^2) \lesssim \frac{s \log^2 m}{m u^2}.$$

Proof of Theorem 5.1. The proof follows an identical path to that of Theorem 4.1. First, as was noted above, $d_F(\mathcal{A}) \le 1$ and $d_{2\to2}(\mathcal{A}) \le \sqrt{s/m}$. Also, using the bound (2.1) of $\gamma_2$ by the Dudley-type integral and a direct application of Lemma 5.2,

$$\gamma_2(\mathcal{A}, \|\cdot\|_{2\to2}) \lesssim \int_0^{d_{2\to2}(\mathcal{A})} \log^{1/2} N(\mathcal{A}, \|\cdot\|_{2\to2}, u)\, du \lesssim \sqrt{\frac{s}{m}}\, (\log s)(\log m).$$

Here we used the first bound of Lemma 5.2 for $u \le m^{-1/2}$ and the second bound for $u > m^{-1/2}$. The claim is now a direct application of Theorem 3.1. ■



Remark 5.4 The only properties of the system $\{\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ that have been used in the proof are the facts that all $\pi(\lambda)$ are unitary and that $\{m^{-1/2}\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ is an orthonormal system with respect to the Frobenius inner product. Therefore, Theorem 5.1 also holds true for general systems of operators with these two properties.



A Appendix

A.1 Proof of Theorem 2.3

Without loss of generality assume that $T$ is finite. Fix an admissible sequence $(T_r)$ of $T$, let $\pi_r(t) \in T_r$, for $t \in T$, be an element in $T_r$ with the smallest $\ell_2$-distance to $t$, and choose $\ell$ for which $2^{\ell-1} < 2p \le 2^\ell$. Since one may assume that $\pi_r(t) = t$ for a sufficiently large $r$, one has

$$\sup_{t \in T} |\langle t, Y\rangle| \le \sup_{t \in T} |\langle \pi_\ell(t), Y\rangle| + \sup_{t \in T} \sum_{r=\ell}^{\infty} |\langle \pi_{r+1}(t) - \pi_r(t), Y\rangle|. \qquad (A.1)$$
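The $p$-th moment of the first term can be estimated as

$$\begin{aligned}
\Big(\mathbb{E}\sup_{t \in T} |\langle \pi_\ell(t), Y\rangle|^p\Big)^{1/p} &\le \Big(\mathbb{E}\sum_{t \in T_\ell} |\langle t, Y\rangle|^p\Big)^{1/p} \le |T_\ell|^{1/p} \sup_{t \in T_\ell} \big(\mathbb{E}|\langle t, Y\rangle|^p\big)^{1/p} \\
&\le \big(2^{2^\ell}\big)^{1/p} \sup_{t \in T} \big(\mathbb{E}|\langle t, Y\rangle|^p\big)^{1/p} \le 16 \sup_{t \in T} \big(\mathbb{E}|\langle t, Y\rangle|^p\big)^{1/p},
\end{aligned}$$

where the last inequality follows from the choice of $\ell$, which gives $2^\ell < 4p$.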
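The choice of $\ell$ and the resulting factor 16 can be checked directly: with $2^{\ell-1} < 2p \le 2^\ell$ one has $2^\ell < 4p$, so $(2^{2^\ell})^{1/p} = 2^{2^\ell/p} < 2^4$. A small Python check for a few arbitrary values of $p \ge 1$ (the function names are illustrative):

```python
import math

def choose_ell(p):
    """Smallest integer ell with 2p <= 2**ell; then 2**(ell - 1) < 2p holds automatically."""
    return math.ceil(math.log2(2 * p))

def moment_factor(p):
    """(|T_ell|)^(1/p) with |T_ell| <= 2^(2^ell), computed as 2^(2^ell / p)."""
    ell = choose_ell(p)
    return 2.0 ** (2 ** ell / p)

for p in [1, 1.5, 2, 3, 10, 37.2, 1000]:
    ell = choose_ell(p)
    assert 2 ** (ell - 1) < 2 * p <= 2 ** ell
    print(p, ell, round(moment_factor(p), 3))   # always at most 16
```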

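Since $Y$ is an $L$-subgaussian vector, one obtains for the second term in (A.1)

$$\begin{aligned}
&\mathbb{P}\Big(\exists\, t \in T : \sum_{r=\ell}^{\infty} |\langle \pi_{r+1}(t) - \pi_r(t), Y\rangle| > u \sum_{r=\ell}^{\infty} 2^{r/2} \big\|(\langle \pi_{r+1}(t) - \pi_r(t), x_i\rangle)_{i=1}^m\big\|_2\Big) \\
&\quad \le \sum_{r=\ell}^{\infty} \sum_{(\pi_{r+1}(t),\, \pi_r(t))} \mathbb{P}\Big(|\langle \pi_{r+1}(t) - \pi_r(t), Y\rangle| > u\, 2^{r/2} \big\|(\langle \pi_{r+1}(t) - \pi_r(t), x_i\rangle)_{i=1}^m\big\|_2\Big) \\
&\quad \le \sum_{r=\ell}^{\infty} 2^{2^{r+1}} \cdot 2^{2^r} \exp(-2^r u^2/2) \le 2\exp(-2^\ell u^2/4) \le 2\exp(-p u^2/2),
\end{aligned} \qquad (A.2)$$

where the inner sum runs over the at most $|T_{r+1}|\,|T_r| \le 2^{2^{r+1}} \cdot 2^{2^r}$ distinct pairs $(\pi_{r+1}(t), \pi_r(t))$ and the last step uses $2^\ell \ge 2p$. This holds when $u \ge c$ for an appropriate choice of $c$ (independent of $\ell$).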
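Therefore, by integration,

$$\Big(\mathbb{E}\sup_{t \in T} \Big(\sum_{r=\ell}^{\infty} |\langle \pi_{r+1}(t) - \pi_r(t), Y\rangle|\Big)^p\Big)^{1/p} \lesssim_L \sup_{t \in T} \sum_{r=\ell}^{\infty} 2^{r/2} \big\|(\langle \pi_{r+1}(t) - \pi_r(t), x_i\rangle)_{i=1}^m\big\|_2 \lesssim_L \gamma_2(T', \|\cdot\|_2),$$

where $T' = \{(\langle t, x_i\rangle)_{i=1}^m : t \in T\}$. By the majorizing measures theorem,

$$\gamma_2(T', \|\cdot\|_2) \lesssim \mathbb{E}\sup_{z \in T'} \Big|\sum_{i=1}^m g_i z_i\Big| = \mathbb{E}\sup_{t \in T} \Big|\sum_{i=1}^m g_i \langle t, x_i\rangle\Big| = \mathbb{E}\sup_{t \in T} |\langle t, G\rangle|,$$

which yields the claim.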
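The series estimate in (A.2) only uses $2^{2^{r+1}} \cdot 2^{2^r} = 2^{3 \cdot 2^r}$ and the doubly exponential growth of $2^r$. The following sketch evaluates both sides of $\sum_{r \ge \ell} 2^{3 \cdot 2^r} e^{-2^r u^2/2} \le 2 e^{-2^\ell u^2/4}$ in logarithms, with the summand taken exactly as written in (A.2) and the series truncated after 60 terms (the remainder is far below double precision). The threshold $u \ge 3$ is an assumption of this sketch, motivated by the fact that $u^2/2 - 3\log 2 \ge u^2/4$ once $u^2 \ge 12 \log 2$; it is not a constant taken from the text.

```python
import math

def log_lhs(ell, u, terms=60):
    """log of sum_{r=ell}^{ell+terms-1} 2^(3*2^r) * exp(-2^r u^2 / 2), via log-sum-exp."""
    vals = [(2.0 ** r) * (3 * math.log(2) - u * u / 2) for r in range(ell, ell + terms)]
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def log_rhs(ell, u):
    """log of 2 * exp(-2^ell u^2 / 4)."""
    return math.log(2) - (2.0 ** ell) * u * u / 4

for ell in range(0, 12):
    for u in (3.0, 4.0, 6.0, 10.0):
        assert log_lhs(ell, u) <= log_rhs(ell, u)
print("series bound holds on the grid")
```

For small $u$ (for instance $u = 1$) the summands grow with $r$ and the bound fails, which is why (A.2) is stated only for $u \ge c$.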



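A.2 Proof of Lemma 5.3

If $x \in \operatorname{conv}(U)$ then $x = \sum_{j=1}^N \theta_j u_j$ with $\theta_j \ge 0$ and $\sum_{j=1}^N \theta_j = 1$. Let $Z \in X$ be a random vector which takes the value $u_j$ with probability $\theta_j$ for $j = 1, \ldots, N$ and thus satisfies $\mathbb{E}Z = x$. Let $L$ be a number to be determined later, let $Z_1, \ldots, Z_L$ be independent copies of $Z$, and put

$$Y = \frac{1}{L} \sum_{j=1}^L Z_j.$$

If $(\varepsilon_j)_{j=1}^L$ is a Rademacher sequence independent of $(Z_j)$, then by a standard symmetrization argument (see, e.g., [27]) and because $(Z_j)_{j=1}^L$ ranges over $U^L$,

$$\mathbb{E}\|x - Y\|_X = \frac{1}{L}\mathbb{E}\Big\|\sum_{j=1}^L (x - Z_j)\Big\|_X \le \frac{2}{L}\mathbb{E}\Big\|\sum_{j=1}^L \varepsilon_j Z_j\Big\|_X \le \frac{2A}{\sqrt{L}}.$$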
Thus, choosing $L \simeq (A/u)^2$ with $L \ge (2A/u)^2$, there exists a realization $y = \frac{1}{L}\sum_{j=1}^L z_j$ for some $z_j \in U$ for which

$$\|x - y\|_X \le \frac{2A}{\sqrt{L}} \le u.$$

As this argument applies to any $x \in \operatorname{conv}(U)$, any such $x$ can be approximated by some $y$ of this form. Since $y$ can assume at most $N^L$ different values, this yields

$$\log N(\operatorname{conv}(U), \|\cdot\|_X, u) \le L \log N \le c\,(A/u)^2 \log N,$$

as claimed.